72 research outputs found

    Psycholinguistik

    5.1 Introduction to the research area. Psycholinguistics is the branch of linguistics concerned with the relationship between human language and thinking and other mental processes; that is, it addresses a set of essential questions such as: (1) How does our brain manage to understand essentially acoustic and visual communicative information and to turn it into mental representations? (2) How can our brain turn a complex state of affairs that we want to convey to others into a sequence of verbal and non-verbal actions that others can process? (3) How do we manage to learn languages at the different stages of life? (4) Are the cognitive processes of language processing universal, even though language systems differ so much that hardly any universals can be found in their structures?

    Building a Disciplinary, World-Wide Data Infrastructure

    Sharing scientific data, with the objective of making it fully discoverable, accessible, assessable, intelligible, usable, and interoperable, requires work at the disciplinary level to define in particular how the data should be formatted and described. Each discipline has its own organization and history as a starting point, and this paper explores the way a range of disciplines, namely materials science, crystallography, astronomy, earth sciences, humanities, and linguistics, organize themselves at the international level to tackle this question. In each case, the disciplinary culture with respect to data sharing, the science drivers, the organization, and the lessons learnt are briefly described, as well as the elements of the specific data infrastructure which are or could be shared with others. Commonalities and differences are assessed. Common key elements for success are identified: data sharing should be science driven; defining the disciplinary part of interdisciplinary standards is mandatory but challenging; sharing of applications should accompany data sharing. Incentives such as journal and funding agency requirements are also similar. For all disciplines, it also appears that social aspects are more challenging than technological ones. Governance is more diverse and linked to the organization of each discipline. CODATA, the RDA, and the WDS can facilitate the establishment of disciplinary interoperability frameworks. Being problem-driven is also a key factor of success for building bridges to enable interdisciplinary research. Comment: Proceedings of the session "Building a disciplinary, world-wide data infrastructure" of SciDataCon 2016, held in Denver, CO, USA, 12-14 September 2016, to be published in ICSU CODATA Data Science Journal in 201

    Foundations of Modern Language Resource Archives

    A number of serious reasons will convince an increasing number of researchers to store their relevant material in centers which we will call "language resource archives". These combine the duty of long-term preservation with the task of giving different user groups access to their material. Access here means that active interaction with the data is made possible, supporting the integration of new data, new versions, or commentaries of all sorts. Modern language resource archives will have to adhere to a number of basic principles to fulfill all requirements, and they will have to be involved in federations that create joint language resource domains, making it even simpler for researchers to access the data. This paper attempts to formulate the essential pillars that language resource archives have to adhere to.

    Connecting Repositories to one Integrated Domain

    Information is the new commodity in the global economy, and trustworthy digital repositories will be key pillars within this new ecosystem. The value of this digital information will only be realised if these repositories can be interacted with in a consistent manner and their data is accessible and understandable globally. Establishing such a data interoperability layer is the goal of the emerging domain of Digital Objects. When considering how to design this interoperability layer, it is important to note that repositories need to be considered from two different perspectives: (1) repositories are a reflection of the institutions that make them operational (quality of service, skilled experts, accessibility over many years, appropriate data management procedures); (2) repositories are computational services that provide a specific set of functions. Complicating the effort to make repositories accessible and interoperable across the globe is the fact that many existing repositories have been developed over the past decades using a wide range of heterogeneous technologies and ways of organising data and functionality. Many of these repositories are their own data silos and are not interoperable. It is important to realise that much money has been invested to build these repositories, so we cannot expect them to make large changes without strong incentives and funding. This heterogeneity is the core of the challenge in making digital information the new commodity in the emerging global domain of digital objects.
    This paper focuses on the functional aspects of repositories and proposes the FAIR Digital Object (FDO) model as a core data model for describing digital information, together with the Digital Object Interface Protocol (DOIP), to establish interoperable communication with all repositories independently of their respective technical choices. It is the conviction of the authors that this integration of the FDO model and DOIP with existing repositories can be performed with minimal effort, and we present examples that document this claim. We present three examples of existing integration in this paper: (1) an integration of B2SHARE, (2) a CORDRA repository, and (3) an integration of the DOBES archive.
    B2SHARE is a repository that has assigned Persistent Identifiers (PIDs, Handles) to all of its digital files. It allows users to add metadata according to a unified schema, but also allows user communities to extend this schema. The API allows one to specify a Handle, which then gives access to the metadata and/or the bit sequences of the digital object (DO); note that B2SHARE allows a set of bit sequences to be linked with one Handle. The integration consists of building a proxy that provides a DOIP interface to B2SHARE and streamlines the integration of the data and metadata into a single DO. The development of the proxy was relatively simple and did not require any changes on the part of the B2SHARE repository. CORDRA is a CNRI repository/registry/registration system that manages DOs, assigns Handles to all of its DOs, and is accessible through DOIP; for all intents and purposes, it implements many of the features of the Digital Object Architecture. The integration of the two repositories enables copying files or moving digital objects. In the case of copying files (metadata and bit sequences) from B2SHARE to CORDRA, for example, all functionality of the CORDRA service, such as searching, becomes possible.
    It is important to note that in this case the PID record identifying the digital object in the B2SHARE repository would have to be extended to point to the alternative path, and the API of B2SHARE would have to offer the alternative access paths to a client; this latter aspect has not been implemented. Moving a DO from B2SHARE to CORDRA would result in changing the ownership of the PID and adding the updated information about the DO.
    The adaptation of the DOBES archive has not yet been carried out, but since this archive has some special functionalities, it is interesting to discuss how such an adaptation could be done. In the DOBES archive, each bundle of closely related digital objects is assigned a Handle, and metadata is also treated as a digital object, i.e., it has a separate Handle. For management reasons, and especially to enable different contributors to maintain control over access rights, a tree structure was developed that allows contributors to organise their data according to specific criteria and allows users to browse the archive in addition to executing searches on the metadata. While accessing archival objects is comparatively simple, the ingest/upload feature is more complex. The archive supports establishing a canonical tree of resources to define scopes for authorisation (who has the right to grant access permissions, etc.) and to facilitate lookup by supporting browsing according to understandable criteria. Depositors therefore need to specify where in the tree new resources should be integrated and which initial rights are associated with them. After uploading the gathered information into a workspace, the archive carries out many checks in a micro-workflow: metadata is checked against vocabularies and partly curated, types of bit sequences are checked and aligned with the information in the metadata, etc. An operation called the gatekeeper has been developed to ensure a highly consistent archive despite the many (remote) people contributing to its content. Thus, the archive requires a set of four information units to be specified: (1) the set of bit sequences to be uploaded, (2) the metadata describing the bundle, (3) the node to be used to organise the resources, and (4) the initial rights, where the default would be "open". Adapting this archive to DOIP would imply that the proxy provides a set of operations such as "ingest a complex object", "update metadata", "add another bit-sequence to a specific object", "get me the list of operations", "give me the metadata", etc. A client must be developed for the front-end interaction with a user, allowing them to specify the required information and to choose a suitable operation. The client would then interact with the repository via DOIP by starting, for example, the gatekeeper as an external operation.
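    As a minimal sketch of what such a DOIP proxy could look like, the following Python fragment maps a DOIP-style "Retrieve" request onto a B2SHARE-style REST lookup and bundles metadata and bit-sequence references into one digital object. The operation and status identifiers, the endpoint path, and the resolve_record_id helper are assumptions made for illustration; they are not taken from the actual implementation.

        # Hypothetical sketch of a DOIP -> B2SHARE proxy; endpoint paths, field
        # names, and DOIP identifiers below are assumptions for illustration.
        import requests

        B2SHARE_BASE = "https://b2share.example.org"  # placeholder base URL

        def resolve_record_id(handle: str) -> str:
            """Assumed helper: map a Handle (PID) to a B2SHARE record id,
            e.g. via the Handle System resolver."""
            return handle.split("/")[-1]

        def doip_retrieve(request: dict) -> dict:
            """Serve a DOIP-style Retrieve request by combining the record's
            metadata and its list of bit-sequence references into a single DO."""
            if request.get("operationId") != "0.DOIP/Op.Retrieve":
                return {"status": "0.DOIP/Status.104"}  # "operation not supported" (assumed code)

            record_id = resolve_record_id(request["targetId"])
            record = requests.get(f"{B2SHARE_BASE}/api/records/{record_id}").json()

            return {
                "status": "0.DOIP/Status.001",  # "success" (assumed code)
                "output": {
                    "id": request["targetId"],
                    "attributes": {"metadata": record.get("metadata", {})},
                    "elements": [
                        {"id": f.get("key"), "length": f.get("size")}
                        for f in record.get("files", [])
                    ],
                },
            }

        # A client request might then look like:
        # doip_retrieve({"targetId": "21.T12345/abcd", "operationId": "0.DOIP/Op.Retrieve"})

    An ingest operation for the DOBES case would analogously package the four information units (bit sequences, metadata, target node, initial rights) into one request and trigger the gatekeeper checks on the archive side.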

    Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case

    Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics, and other Digital Humanities domains. Effective retrieval of newspaper content based on metadata alone is nearly impossible, which makes retrieval based on the (digitized) full text particularly relevant. Europeana, Europe's Digital Library, is in the position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant for Europeana's objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana's newspaper full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT), and the practices promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a "full-text profile" for the Europeana Data Model, which is being applied to Europeana's newspaper corpus.
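    As a purely illustrative sketch (not the actual Europeana full-text profile), the following Python fragment shows how a digitized newspaper page, its full-text transcription, and its page image might be linked in a JSON-LD-like record, loosely following IIIF and Web Annotation practice; all property names, identifiers, and URLs are invented for illustration.

        import json

        # Hypothetical, simplified record linking a newspaper page image to its
        # full-text transcription; the property names are illustrative only and
        # do not reproduce the actual Europeana Data Model full-text profile.
        page_record = {
            "@id": "https://data.example.org/newspaper/issue-1871-03-01/page-2",
            "type": "NewspaperPage",
            "image": "https://iiif.example.org/image/page-2/full/full/0/default.jpg",
            "fullTextResource": {
                "@id": "https://data.example.org/fulltext/page-2",
                "language": "de",
                "value": "Berlin, 1. Maerz 1871. ...",  # OCR output for the page
            },
            "annotation": {
                # Anchors the transcription to a region of the page image, in the
                # spirit of IIIF / Web Annotation targeting.
                "target": "https://iiif.example.org/canvas/page-2#xywh=0,0,2000,3000",
                "body": "https://data.example.org/fulltext/page-2",
            },
        }

        print(json.dumps(page_record, indent=2))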

    Foundation of a Component-based Flexible Registry for Language Resources and Technology

    Within the CLARIN e-science infrastructure project it is foreseen to develop a component-based registry for metadata for Language Resources and Language Technology. With this registry it is hoped to overcome the problems of the currently available systems with respect to inflexible fixed schemas, unsuitable terminology, and interoperability problems. The registry will address interoperability needs by referring to a shared vocabulary registered in data category registries as suggested by ISO.
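    The component idea can be illustrated with a small sketch, assuming invented component names, element names, and concept-link URLs; this is not the CLARIN component registry itself, only an illustration of how reusable components referring to a shared data category registry might be modelled.

        from dataclasses import dataclass, field
        from typing import List

        # Illustrative model: reusable metadata components whose elements carry
        # concept links into a shared data category registry. All names and URLs
        # below are invented.

        @dataclass
        class Element:
            name: str
            concept_link: str            # reference into a data category registry
            value_scheme: str = "string"

        @dataclass
        class Component:
            name: str
            elements: List[Element] = field(default_factory=list)
            components: List["Component"] = field(default_factory=list)  # nesting

        # A small "Actor" component that several resource profiles could reuse.
        actor = Component(
            name="Actor",
            elements=[
                Element("fullName", "https://registry.example.org/dc/fullName"),
                Element("role", "https://registry.example.org/dc/actorRole"),
            ],
        )

        # A profile is itself a component bundling others, so new resource types
        # can be described by recombining components instead of changing a fixed schema.
        session_profile = Component(name="Session", components=[actor])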

    A sustainable archiving software solution for The Language Archive

    [Archive X] has been developing a language archiving solution for more than 15 years now. The software is not only aimed at archiving and access but also integrates with a range of exploitation tools. This in-house solution was created from the ground up, since at the time no mature open-source repository solutions were available. The situation today is rather different, with several widely used repository solutions available, including open-source solutions that are maintained by communities of developers. Since [Archive X] now needs to reduce the number of staff required for the maintenance of its archiving software, it was decided to develop a new system based on one of the widely used open-source repository solutions such as Fedora Commons (1) or DSpace (2). In this paper we describe the process of selecting the most suitable open-source repository solution as the basis for [Archive X]. This includes the specification of the functional and technical requirements and their prioritization, as well as the evaluation of a number of repository solutions, including an assessment of their long-term perspective. None of the existing repository solutions provides the complete minimal functionality that [Archive X] requires from its archiving software. This means that additional components or modules need to be developed or adapted from the current software, regardless of the chosen repository solution. Still, we expect that using an existing extensible repository system as a basis will be less costly in the long run. Several language archives, in particular those that serve as centers (3) within the CLARIN consortium, have already implemented repository systems based on either DSpace or Fedora Commons; their experiences and recommendations are also taken into account in the evaluation of the various options. The final decision on which repository system will form the basis of the new archiving software will be taken by the end of September 2014. The development of the new archiving software will then start soon after that, and a production-ready version will need to be finished by October 2016 at the latest. (1) http://fedorarepository.org/ (2) http://www.dspace.org/ (3) https://centerregistry-clarin.esc.rzg.mpg.de
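    The prioritization and evaluation described here can be illustrated with a small, hypothetical weighted-scoring sketch; the requirement names, weights, and scores below are invented and do not reflect the actual evaluation or its outcome.

        # Hypothetical weighted scoring of repository solutions against
        # prioritized requirements; all names, weights and scores are invented.
        requirements = {                 # requirement -> weight (higher = more important)
            "persistent identifiers": 5,
            "versioning of objects": 4,
            "access control": 5,
            "active developer community": 3,
        }

        candidates = {                   # candidate -> requirement -> score in 0..3
            "Solution A": {"persistent identifiers": 3, "versioning of objects": 2,
                           "access control": 2, "active developer community": 3},
            "Solution B": {"persistent identifiers": 2, "versioning of objects": 1,
                           "access control": 2, "active developer community": 3},
        }

        def total(scores: dict) -> int:
            """Weighted sum over all requirements."""
            return sum(weight * scores.get(req, 0) for req, weight in requirements.items())

        for name, scores in sorted(candidates.items(), key=lambda kv: -total(kv[1])):
            print(f"{name}: {total(scores)}")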

    Sustainability and Genericity of CLARIN Services in the Netherlands

    Based on the ten years that have elapsed since the start of the CLARIN-NL project and its follow-up CLARIAH-NL, this chapter offers an analysis of the sustainability and genericity of services created in the context of CLARIN in the Netherlands. Our focus is on search applications, for which we make a proposal for arriving at a more efficient and sustainable approach, not only in the Netherlands but also CLARIN-wide. We also offer a number of general recommendations for improving the sustainability of infrastructure services.

    LREP: A Language Repository Exchange Protocol

    The recent increase in the number and complexity of the language resources available on the Internet has been accompanied by a similar increase in the number of available tools for linguistic analysis. Ideally, the user should not be confronted with the question of how to match tools with resources. If resource repositories and tool repositories offer adequate metadata and a suitable exchange protocol is developed, this matching process could be performed (semi-)automatically.
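    A minimal sketch of the kind of (semi-)automatic matching envisaged, assuming that both repositories expose simple metadata fields such as a media type and an annotation format; the field names and entries below are invented for illustration.

        # Hypothetical matching of analysis tools to language resources based on
        # metadata exposed by a resource repository and a tool repository.
        resources = [
            {"id": "res-001", "media": "audio/wav", "annotation": "EAF"},
            {"id": "res-002", "media": "text/plain", "annotation": None},
        ]

        tools = [
            {"id": "tool-annotator", "accepts_media": {"audio/wav", "video/mp4"},
             "accepts_annotation": {"EAF"}},
            {"id": "tool-tagger", "accepts_media": {"text/plain"},
             "accepts_annotation": {None}},
        ]

        def matches(resource: dict, tool: dict) -> bool:
            """A tool matches a resource if it accepts both the resource's media
            type and its annotation format -- the kind of check an exchange
            protocol between repositories could support automatically."""
            return (resource["media"] in tool["accepts_media"]
                    and resource["annotation"] in tool["accepts_annotation"])

        for res in resources:
            print(res["id"], "->", [t["id"] for t in tools if matches(res, t)])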
